Linear Regression

Regression analysis is a statistical method used for predicting numerical values based on input features. Common applications include predicting home prices, stock values, patient hospital stays, and retail sales forecasts.

Regression vs. Classification Problems

  • Regression Problems: Concerned with predicting continuous numerical values.
  • Classification Problems: Focused on predicting categorical outcomes.

Linear Regression

Definition

Linear regression is a foundational algorithm in regression analysis that assumes a linear relationship between input features and the target variable.

Applications

  • Predicting housing prices based on area and age.
  • Estimating stock prices using historical data.
  • Forecasting patient hospital stays.

Concepts

Dataset Terminology

  • Training Dataset: The subset of data used to fit the model.
  • Example: A single instance or row in the dataset.
  • Label (Target): The outcome variable the model aims to predict.
  • Features: The input variables used to predict the label.

Linear Regression Model

Linear Equation

The linear regression model expresses the relationship between features and the label using the following equation:

$$\text{price} = w_{\text{area}} \cdot \text{area} + w_{\text{age}} \cdot \text{age} + b$$
  • $w_{\text{area}}$ and $w_{\text{age}}$: Weights assigned to each feature.
  • $b$: Bias term (intercept).
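
As a minimal sketch, the equation can be evaluated directly for a single house. The function name `predict_price` and all numeric values below are made up for illustration; they are not fitted parameters.

```python
def predict_price(area, age, w_area, w_age, b):
    """Evaluate price = w_area * area + w_age * age + b for one example."""
    return w_area * area + w_age * age + b

# Hypothetical parameters (illustrative only, not learned from data).
w_area, w_age, b = 3.0, -2.0, 50.0

price = predict_price(area=100.0, age=10.0, w_area=w_area, w_age=w_age, b=b)
print(price)  # 3*100 - 2*10 + 50 = 330.0
```

A positive weight raises the prediction as its feature grows, a negative weight lowers it, and the bias is the prediction when every feature is zero.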

Matrix Formulation

With many features, the model is written compactly using vectors. For a single example, the prediction is the dot product of the weights and the features, plus the bias:

$$\hat{y} = \mathbf{w}^\top \mathbf{x} + b$$
  • $\mathbf{x}$: Feature vector.
  • $\mathbf{w}$: Weight vector.

For multiple observations, the equation extends to:

$$\hat{\mathbf{y}} = \mathbf{X} \mathbf{w} + b$$
  • $\mathbf{X}$: Design matrix containing all feature vectors.
  • $\hat{\mathbf{y}}$: Vector of predicted values.
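
The matrix form maps onto one array expression. The sketch below uses NumPy and assumes one row per example in the design matrix; the weights, bias, and feature values are hypothetical.

```python
import numpy as np

# Design matrix: one row per example, columns are (area, age).
X = np.array([[100.0, 10.0],
              [80.0, 5.0],
              [120.0, 20.0]])
w = np.array([3.0, -2.0])  # hypothetical weight vector
b = 50.0                   # hypothetical bias, broadcast to every row

# All predictions at once: y_hat[i] = w . X[i] + b
y_hat = X @ w + b
print(y_hat)
```

The scalar bias is added to every entry of the product, so no explicit loop over examples is needed.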

Loss Function

Definition

The loss function quantifies the difference between the model's predictions and the actual data.

Mean Squared Error (MSE)

A commonly used loss function in regression tasks:

$$L(\mathbf{w}, b) = \frac{1}{n} \sum_{i=1}^n \left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)^2$$
  • $n$: Number of data points.
  • $y^{(i)}$: Actual target value for the $i^{\text{th}}$ example.
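
The loss can be computed term by term exactly as the sum is written. This is a plain-Python sketch; the function name `mse_loss` and the tiny dataset are assumptions made for illustration.

```python
def mse_loss(w, b, X, y):
    """Mean squared error of the linear model w^T x + b over a dataset."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        # Prediction for one example: dot product of weights and features, plus bias.
        pred = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
        total += (pred - y_i) ** 2
    return total / len(y)

# Two examples with two features each (made-up values).
X = [[1.0, 2.0], [3.0, 4.0]]
y = [5.0, 6.0]

loss = mse_loss([1.0, 1.0], 0.0, X, y)
print(loss)  # predictions are 3 and 7, errors -2 and 1, so (4 + 1) / 2 = 2.5
```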

Optimization

Analytic Solution

The optimal weights can be computed directly with matrix operations, provided the design matrix $\mathbf{X}$ has full column rank.
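
One common way to carry out this computation is through the normal equations, $(\mathbf{X}^\top \mathbf{X})\,\mathbf{w} = \mathbf{X}^\top \mathbf{y}$. The NumPy sketch below absorbs the bias by appending a column of ones to the design matrix. The function name `fit_closed_form` and the synthetic, noise-free data are assumptions for illustration.

```python
import numpy as np

def fit_closed_form(X, y):
    """Solve the normal equations (X^T X) p = X^T y with a bias column appended."""
    # The appended column of ones lets the last parameter play the role of b.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    params = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
    return params[:-1], params[-1]

# Synthetic, noise-free data generated from known (hypothetical) parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -3.4]) + 4.2

w, b = fit_closed_form(X, y)
print(w, b)
```

If the augmented matrix does not have full column rank, `np.linalg.solve` raises an error or returns unstable values; a least-squares routine such as `np.linalg.lstsq` is the usual fallback in that case.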

Gradient Descent

An iterative optimization technique to minimize the loss function by updating weights in the opposite direction of the gradient.
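
A minimal sketch of this idea for a one-feature model with the MSE loss follows. The learning rate, step count, and data are assumptions chosen so that the toy problem converges; they are not recommended defaults.

```python
def gradient_descent(xs, ys, lr=0.1, num_steps=1000):
    """Fit y = w*x + b by full-batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(num_steps):
        # Prediction errors on the whole dataset at the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step in the direction opposite to the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = gradient_descent(xs, ys)
print(w, b)
```

Every step uses all examples, which is what distinguishes this from the minibatch variant.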

Minibatch Stochastic Gradient Descent (SGD)

  • Minibatch SGD: Utilizes small, random subsets of data (minibatches) for more frequent weight updates.
  • Advantages: Each update costs far less than a full pass over the data, while averaging over a batch gives a less noisy gradient estimate than a single example would.
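
The following plain-Python sketch shows one way to implement minibatch SGD for a one-feature linear model. The function name `minibatch_sgd`, the hyperparameters, and the noise-free toy data are assumptions for illustration.

```python
import random

def minibatch_sgd(xs, ys, lr=0.05, batch_size=4, num_epochs=500, seed=0):
    """Fit y = w*x + b with minibatch SGD on the MSE loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(num_epochs):
        rng.shuffle(indices)  # new random order of examples each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [w * xs[i] + b - ys[i] for i in batch]
            # Gradient estimate computed from this minibatch only.
            grad_w = 2.0 / len(batch) * sum(e * xs[i] for e, i in zip(errors, batch))
            grad_b = 2.0 / len(batch) * sum(errors)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Noise-free toy data generated from y = 2x + 1.
xs = [i / 10 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

w, b = minibatch_sgd(xs, ys)
print(w, b)
```

With 20 examples and a batch size of 4, each epoch performs five parameter updates instead of the single update that full-batch gradient descent would make.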

Practical Example

PyTorch Module Implementation

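This example sets up a complete PyTorch training pipeline in an object-oriented style, with separate classes for the model, the data module, and the trainer. Note that the model shown is a small two-layer classifier trained with cross-entropy loss on random dummy data, so it illustrates the code structure rather than linear regression itself.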

Module Definition

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import transforms

class Module(nn.Module):
    """The base class for models."""
    def __init__(self):
        super().__init__()
        # Define the layers of the model
        self.net = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 10)
        )

    def forward(self, X):
        """Forward pass through the network."""
        return self.net(X)

    def training_step(self, batch):
        """Compute the loss for a batch of data."""
        inputs, targets = batch
        outputs = self(inputs)
        loss = nn.CrossEntropyLoss()(outputs, targets)
        return loss

    def configure_optimizers(self):
        """Set up optimizers."""
        return optim.SGD(self.parameters(), lr=0.1)

Data Module

class DataModule:
    """The base class for data handling."""
    def __init__(self, batch_size=64):
        # Transformations that would be applied to each item of a real image
        # dataset; not applied to the dummy tensors below
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
        ])

        # Dummy data: 1000 examples, 784 features each (28x28 images flattened)
        features = torch.rand(1000, 784)
        labels = torch.randint(0, 10, (1000,))

        dataset = TensorDataset(features, labels)
        self.dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    def get_dataloader(self):
        return self.dataloader

Trainer

class Trainer:
    """The base class for training models."""
    def __init__(self, model, data_module, max_epochs=10):
        self.model = model
        self.data_module = data_module
        self.max_epochs = max_epochs
        self.optimizer = model.configure_optimizers()

    def fit(self):
        """Execute the training loop."""
        for epoch in range(self.max_epochs):
            for batch in self.data_module.get_dataloader():
                loss = self.model.training_step(batch)
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()
            # Report the loss of the last minibatch in this epoch
            print(f'Epoch {epoch}, Loss: {loss.item()}')

Execution

# Create instances of the model, data module, and trainer
model = Module()
data_module = DataModule()
trainer = Trainer(model, data_module)

# Start training
trainer.fit()
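
The classes above train a classifier. For the linear regression setting this section describes, the same pattern could be adapted roughly as sketched below: a single `nn.Linear` layer with one output and the MSE loss. The class name `LinearRegression`, the synthetic data, and the hyperparameters are assumptions for illustration, not part of the original example.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data from known (hypothetical) parameters plus small noise.
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
features = torch.randn(1000, 2)
labels = features @ true_w + true_b + 0.01 * torch.randn(1000)

class LinearRegression(nn.Module):
    """A linear model y_hat = w^T x + b trained with the MSE loss."""
    def __init__(self, num_features):
        super().__init__()
        self.net = nn.Linear(num_features, 1)

    def forward(self, X):
        # Drop the trailing dimension so predictions match the label shape.
        return self.net(X).squeeze(-1)

    def training_step(self, batch):
        inputs, targets = batch
        return nn.functional.mse_loss(self(inputs), targets)

    def configure_optimizers(self):
        return optim.SGD(self.parameters(), lr=0.03)

model = LinearRegression(num_features=2)
optimizer = model.configure_optimizers()
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

# Minibatch SGD training loop, mirroring Trainer.fit above.
for epoch in range(10):
    for batch in loader:
        loss = model.training_step(batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

learned_w = model.net.weight.detach().squeeze(0)
learned_b = model.net.bias.item()
print(learned_w, learned_b)
```

Because the data is generated from known parameters, comparing `learned_w` and `learned_b` against `true_w` and `true_b` gives a quick sanity check on the training loop.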